CHAPTER 18 A Yes-or-No Proposition: Logistic Regression 265
As shown in Figure 18-7, the ROC curve always starts in the lower-left corner of
the graph, where 0 percent sensitivity intersects with 100 percent specificity. It
ends in the upper-right corner, where 100 percent sensitivity intersects with
0 percent specificity. Most software also draws a diagonal straight line between
the lower-left and upper-right corners because that line represents the formula
sensitivity = 1 – specificity. If your model’s ROC curve were to match that line, it
would indicate that your model has no predictive ability at all.
Like Figure 18-7, every ROC graph has sensitivity running up the Y axis, which is
displayed either as fractions between 0 and 1 or as percentages between 0 and 100.
The X axis either shows 1 – specificity running from left to right or, as in
Figure 18-7, shows specificity labeled backwards, from right to left.
Most ROC curves lie in the upper-left part of the graph area. The farther away
from the diagonal line they are, the better the predictive model is. For a nearly
perfect model, the ROC curve runs up along the Y axis from the lower-left corner
to the upper-left corner, then along the top of the graph from the upper-left
corner to the upper-right corner.
Because of how sensitivity and specificity are calculated, the graph appears as a
series of steps. If you have a large data set, your graph will have more and smaller
steps. For clarity, we show the cut values for predicted probability as a scale along
the ROC curve itself in Figure 18-7, but unfortunately, most statistical software
doesn’t do this for you.
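To see where the steps come from, here is a minimal Python sketch that computes one ROC point per distinct cut value. The data set, the function name roc_points, and the rule that a predicted probability at or above the cut counts as a positive prediction are all assumptions made for illustration; they are not taken from Figure 18-7.

```python
def roc_points(y_true, y_prob):
    """Return a (cut, sensitivity, specificity) tuple for each cut value.

    Assumption: a subject is predicted positive when its predicted
    probability is greater than or equal to the cut value.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # A cut above every probability gives the lower-left corner of the graph.
    cuts = [float("inf")] + sorted(set(y_prob), reverse=True)
    points = []
    for cut in cuts:
        true_pos = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cut)
        true_neg = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cut)
        points.append((cut, true_pos / n_pos, true_neg / n_neg))
    return points


# Hypothetical data: 1 = outcome occurred, 0 = outcome did not occur.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

points = roc_points(y_true, y_prob)
for cut, sens, spec in points:
    print(f"cut={cut:<5} sensitivity={sens:.2f}  1-specificity={1 - spec:.2f}")
```

Each time the cut value drops past one subject's predicted probability, either sensitivity or 1 – specificity jumps by one increment, which is why the plotted curve looks like a staircase. With more subjects, each jump is smaller.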
Looking at the ROC curve helps you choose a cut value that gives the best tradeoff
between sensitivity and specificity:
» To have very few false positives: Choose a higher cut value to give a high
specificity. Figure 18-7 shows that by setting the cut value to 0.6, you can
simultaneously achieve about 93 percent specificity and 87 percent sensitivity.
» To have very few false negatives: Choose a lower cut value to give higher
sensitivity. Figure 18-7 shows that if you set the cut value to 0.3, you can
achieve almost 100 percent sensitivity, but your specificity will be only about
75 percent, meaning you’ll have a 25 percent false positive rate.
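The tradeoff described in the two bullets above can be checked for any candidate cut value with a few lines of Python. This is only a sketch: the data are hypothetical (so the results don't reproduce the numbers read off Figure 18-7), the function name sens_spec_at_cut is made up for this example, and it assumes that a predicted probability at or above the cut value counts as a positive prediction.

```python
def sens_spec_at_cut(y_true, y_prob, cut):
    """Return (sensitivity, specificity) when predicting positive if p >= cut."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cut
        if y == 1 and predicted_positive:
            tp += 1  # true positive
        elif y == 1:
            fn += 1  # false negative
        elif predicted_positive:
            fp += 1  # false positive
        else:
            tn += 1  # true negative
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 1 = outcome occurred, 0 = outcome did not occur.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for cut in (0.3, 0.6):
    sens, spec = sens_spec_at_cut(y_true, y_prob, cut)
    print(f"cut={cut}: sensitivity={sens:.2f}, specificity={spec:.2f}, "
          f"false positive rate={1 - spec:.2f}")
```

In this made-up data set, lowering the cut value from 0.6 to 0.3 raises sensitivity and lowers specificity, which is the same direction of tradeoff the ROC curve shows.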
The software may optionally display the area under the ROC curve (abbreviated
AUC), along with its standard error and a p value. The AUC is another measure of how
good the predictive model is. The diagonal line has an AUC of 0.5, and the p value
comes from a statistical test comparing your AUC to that of the diagonal line. At
α = 0.05, a p value less than 0.05 indicates that your model is statistically
significantly better than the diagonal line at predicting your outcome.
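If your software doesn't report these numbers, the following Python sketch shows one way to approximate them. It computes the AUC as the proportion of (positive, negative) pairs of subjects in which the positive subject has the higher predicted probability, with ties counting as half. That proportion equals the area under the stepped ROC curve. The standard error uses the Hanley and McNeil (1982) approximation, and the p value comes from a two-sided normal (z) test against 0.5. These choices are assumptions made for illustration; your software may use a different method and give slightly different results. The data and function names are hypothetical.

```python
import math


def auc_from_pairs(y_true, y_prob):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of the AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)


def p_value_vs_diagonal(auc, se):
    """Two-sided p value from a z test of AUC against 0.5 (an assumption)."""
    z = (auc - 0.5) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 1 = outcome occurred, 0 = outcome did not occur.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = auc_from_pairs(y_true, y_prob)
se = hanley_mcneil_se(auc, n_pos=sum(y_true), n_neg=len(y_true) - sum(y_true))
p = p_value_vs_diagonal(auc, se)
print(f"AUC={auc:.3f}, SE={se:.3f}, p={p:.3f}")
```

With only eight subjects, the standard error is large, so even an AUC well above 0.5 may not reach statistical significance.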